The Latent Structure of Dictionaries

Authors

  • Philippe Vincent-Lamarre
  • Alexandre Blondin Massé
  • Marcos Lopes
  • Mélanie Lord
  • Odile Marcotte
  • Stevan Harnad
Abstract

How many words, and which ones, are sufficient to define all other words? When dictionaries are analyzed as directed graphs with links from defining words to defined words, they reveal a latent structure. Recursively removing all words that are reachable by definition but that do not define any further words reduces the dictionary to a Kernel of about 10% of its size. This is still not the smallest number of words that can define all the rest. About 75% of the Kernel turns out to be its Core, a "Strongly Connected Subset" of words with a definitional path to and from any pair of its words and no word's definition depending on a word outside the set. But the Core cannot define all the rest of the dictionary. The 25% of the Kernel surrounding the Core consists of small strongly connected subsets of words: the Satellites. The size of the smallest set of words that can define all the rest (the graph's "minimum feedback vertex set", or MinSet) is about 1% of the dictionary, about 15% of the Kernel, and part-Core/part-Satellite. But every dictionary has a huge number of MinSets. The Core words are learned earlier, more frequent, and less concrete than the Satellites, which are in turn learned earlier, more frequent, but more concrete than the rest of the Dictionary. In principle, only one MinSet's words would need to be grounded through the sensorimotor capacity to recognize and categorize their referents. In a dual-code sensorimotor/symbolic model of the mental lexicon, the symbolic code could do all the rest through recombinatory definition.
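
The graph notions in the abstract can be illustrated on a very small example. The following is a minimal sketch, not the authors' implementation: the eight-word dictionary `TOY_DEFS` and all function names are hypothetical, the Core is taken to be the largest strongly connected component whose definitions use no outside word, and the MinSets are found by brute force, which is only feasible for toy graphs.

```python
from itertools import combinations

# Hypothetical toy dictionary: word -> set of words used in its definition.
# A link runs from each defining word to the word it helps define.
TOY_DEFS = {
    "good": {"bad", "not"},
    "bad": {"good", "not"},
    "not": {"good"},
    "big": {"small", "not"},
    "small": {"big", "not"},
    "huge": {"big"},
    "tiny": {"small", "not"},
    "enormous": {"huge", "big"},
}

def kernel(defs):
    """Recursively drop words that define no remaining word."""
    words = set(defs)
    while True:
        used = set()
        for w in words:
            used |= defs[w] & words  # defining words still in the graph
        if used == words:
            return words
        words = used

def reachable(defs, words, start):
    """Words reachable from `start` along defining -> defined links."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in words:
            if u in defs[v] and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strongly_connected_components(defs, words):
    """Group words that have a definitional path to and from each other."""
    reach = {w: reachable(defs, words, w) for w in words}
    comps, assigned = [], set()
    for w in sorted(words):
        if w not in assigned:
            comp = frozenset(v for v in reach[w] if w in reach[v])
            comps.append(comp)
            assigned |= comp
    return comps

def can_define_all(defs, grounded):
    """True if every word becomes definable starting from `grounded`."""
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for w, d in defs.items():
            if w not in known and d <= known:
                known.add(w)
                changed = True
    return known == set(defs)

def min_sets(defs, candidates):
    """Brute force: all smallest candidate subsets that define everything."""
    for k in range(len(candidates) + 1):
        found = [set(c) for c in combinations(sorted(candidates), k)
                 if can_define_all(defs, c)]
        if found:
            return found
    return []

kern = kernel(TOY_DEFS)
comps = strongly_connected_components(TOY_DEFS, kern)
# Assumption: Core = largest component closed under definition.
closed = [c for c in comps if all(TOY_DEFS[w] <= c for w in c)]
core = set(max(closed, key=len))
satellites = kern - core
minsets = min_sets(TOY_DEFS, kern)
print(kern, core, satellites, minsets)
```

In this toy graph the Core alone cannot define "big" or "small", and each smallest defining set mixes one Core word with one Satellite word, which mirrors the part-Core/part-Satellite composition described above. Real dictionaries would need a linear-time component algorithm and a dedicated solver in place of the exhaustive search.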

Similar resources

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

COS 598C: Detecting overlapping communities, and theoretical frameworks for learning deep nets and dictionaries

Today we present some ideas for provable learning of deep nets and dictionaries, two important (and related) models. The common thread is a simple algorithm for detecting overlapping communities in networks. While community detection is typically thought of as a way to discover structure in, say, large social networks, here we use it as a general-purpose algorithmic tool to understand structure of...

A Survey on Latent Fingerprint Enhancement via Image Decomposition Techniques

Latent fingerprints, which are lifted from the surfaces of objects at crime scenes, are used for criminal identification and play a role in crime scene investigations. Latent fingerprint images are usually of poor quality with unclear ridge structure and various overlapping patterns. Latent fingerprints, or simply latents, have been considered as cardinal evidence for identif...

Medical Students’ Perception of Using Electronic Learning Tools in an ESP Program

Given the burgeoning interest in the use of technology and electronic tools for educational purposes among students, this study set out with the purpose of investigating medical students’ perception of using e-learning tools and applications in an English for Specific Purposes (ESP) program at an Iranian medical university. The study also aimed to discover the extent to which the students...

A Patent Document Retrieval System Addressing Both Semantic And Syntactic Properties

Combining the principle of Differential Latent Semantic Index (DLSI) (Chen et al., 2001) and the Template Matching Technique (Tokuda and Chen, 2001), we propose a new user queries-based patent document retrieval system by NLP technology. The DLSI method first narrows down the search space of a sought-after patent document by content search and the template matching technique then pins down the ...

Journal:
  • Topics in Cognitive Science

Volume 8, Issue 3

Pages: -

Publication date: 2016